1.  INTRODUCTION:


This grant uses complementary approaches to improve the early detection of lung cancer. Our goal is to
explore whether detection of DNA methylation changes and enhanced CT evaluations will add to the
specificity of lung cancer detection. This goal is defined in the following aims.

Specific  Aim  1:  To  improve  the  clinical  utility  and  effectiveness  of  a  nested,  gel  based  DNA  methylation 
assay  for  sputum  and  plasma  by  increasing  its  sensitivity  and  specificity  through  nanotechnology. 
Hypothesis:  Detection  of  DNA  methylation  from  individuals  with  cancer  can  be  used  to  determine  lung 
cancer  risk  and  can  be  enhanced  through  discovery  of  optimal  hypermethylated  genes  and  implementation 
of  enhanced  detection  technologies. 

Specific Aim 2: To use in vitro molecular testing of sputum and serum with DNA methylation, rather than
simple demographics alone, to select the highest risk smokers for an expensive screening modality such as
CT scanning. Hypothesis: DNA methylation testing is more specific in selecting those at the highest risk for
lung cancer than clinical demographics alone.

Specific  Aim  3:  To  optimize  low  dose  chest  CT  screening  for  lung  cancer.  Hypothesis:  Valuable 
information  on  the  chest  CT  scan,  based  on  the  severity,  distribution,  and  pattern  of  low  attenuation  areas 
(“emphysema”),  may  be  crucial  to  increasing  our  insights  and  effectiveness  of  determining  lung  cancer  risk, 
the  frequency  of  follow  up  scans,  reducing  false  positives,  and  controlling  costs  compared  to  an  annual  chest 
CT  screening  for  the  sole  use  to  detect  lung  cancer  tumors  after  they  occur. 

2.  KEYWORDS: 

Lung  Cancer  Screening,  CT  Screening,  DNA  Methylation  Detection,  Emphysema  Score,  Lung  Airspace 
Variability  Score. 

3.  OVERALL  PROJECT  SUMMARY: 

During the first two years of this proposal, we largely accomplished the goal of building an improved
method for methylation detection, which comprises specific aim 1. We made significant progress on the two
sub-aims of this proposal in implementing the developments from last year. Last year's progress included
A) developing optimal hypermethylated gene panels for detection of tumor DNA from lung cancer and B)
optimizing nanotechnology-based detection of DNA methylation for increased sensitivity and specificity.

The first efforts were focused on the development of an optimal gene panel for detection of lung
cancer. After completion of these studies, we published the results (1); a summary is provided
here. Hypermethylation of CpG islands is a common and important alteration in the transition from normal
to transformed cells. Following previously validated methods for the discovery of cancer-specific
hypermethylation changes from NSCLC cell lines, we identified >300 candidate genes. Using The Cancer
Genome Atlas (TCGA) and employing extensive filtering to refine our candidate genes for the greatest
ability to distinguish tumor from normal, we initially defined a three-gene panel, CDO1, HOXA9, and
TAC1, which we subsequently validated in two independent cohorts of primary NSCLC samples. This
3-gene panel is 100% specific, showing no methylation in 75 TCGA and 7 primary samples, and is 83-99%
sensitive for NSCLC (shown in last year's progress report). This panel has been further expanded through
the identification of additional genes with extremely high methylation frequencies in lung cancer. It
now includes three additional genes, HOXA7, SOX17 and ZFP42, for which real-time MSP assays
were also developed to complement the previous 3-gene panel and provide redundant tumor coverage to
optimize detection. The expanded panel thus comprises 6 genes with good sensitivity
and specificity, with results shown for 5 genes (Figure 1).

Figure 1. Methylation of CDO1, HOXA9, SOX17, ZFP42 and TAC1 is Highly Sensitive for NSCLC
Detection in stage I lung cancer samples. Highly prevalent methylation sites were chosen from data
generated within the TCGA studies. All normal lung tissues lack DNA methylation, but the majority of lung
tumors have methylation of individual loci, and overall, nearly all tumors have methylation of at least one
locus (adapted from Wrangle (1)).

These  new  assays  were  confirmed  to  specifically  detect  abnormal  methylation  using  normal  lymphocytes 
and  in  vitro  methylated  bisulfite  converted  DNA.  We  found  high  specificity  to  methylation  in  bisulfite 
converted  DNA  and  no  amplification  in  unconverted  and  no  template  controls.  These  new  assays  were 
deployed  in  specific  aim  2  with  good  results. 

Aim 2: The use of methylated tumor-specific circulating DNA has shown great promise as a potential cancer
biomarker. Nonetheless, the relative scarcity of tumor-specific circulating DNA presents a challenge for
traditional DNA extraction and processing techniques. We completed a study of improvements in DNA
processing, with a single-tube extraction and processing technique dubbed "methylation on beads" that
allows for DNA extraction and bisulfite conversion for up to 2 ml of plasma or serum (outline of approach
shown in Figure 2) (2). In comparison to traditional techniques such as phenol-chloroform and alcohol
extraction, methylation on beads yields a 1.5- to 5-fold improvement in extraction efficiency. The greatest
enhancement in extraction efficiency is seen with small amounts of DNA, precisely matching the need for
improved extraction in low DNA content samples such as plasma and serum. A summary of the final results
using this approach is provided in Figure 3.

[Figure 1 panel title: LUAD and LUSC. Binary methylation. Stage I samples.]

Figure 2. Overview of the Methylation-on-Beads (MOB) Process. Circulating DNA from up to 2 ml of
plasma is extracted and purified via SSBs. The purified DNA is then subjected to bisulfite conversion and
analyzed via methylation-specific PCR (MSP). The entire sample preparation process can be performed in a
single tube and consists of an iterative process of adding reagents, magnetic decantation, and removal of
supernatant.

Figure 3. β-Actin Ct values for MOB processed vs. phenol-chloroform extracted and traditionally
processed plasma samples from 24 patients diagnosed with lung cancer. The MOB technique demonstrates
consistently higher and less variable recovery, as demonstrated by the lower average Ct value (33.8 vs. 40.6
cycles) and Ct standard deviation (0.3 vs. 1.9 cycles), respectively. This improvement in Ct of 6.8 cycles
represents a 2^6.8 or 111-fold increase in amplifiable DNA, on average.

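
The fold increase quoted for Figure 3 follows from the usual assumption that PCR product doubles each cycle, so a Ct difference of ΔCt corresponds to a 2^ΔCt difference in amplifiable template. A minimal Python sketch of that arithmetic is given here; the function name is illustrative, and ideal amplification efficiency is an assumption rather than something measured in this report.

```python
def fold_increase(ct_reference: float, ct_improved: float) -> float:
    """Fold increase in amplifiable DNA implied by a lower Ct.

    Assumes ideal PCR efficiency (product doubles every cycle).
    """
    delta_ct = ct_reference - ct_improved
    return 2.0 ** delta_ct


# Average Ct values reported for phenol-chloroform (40.6) and MOB (33.8).
print(round(fold_increase(40.6, 33.8)))  # 111
```

With real assays, efficiency below 100% would make the true fold change somewhat smaller than this idealized figure.
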
Having developed an optimal panel and improved upon methods for processing the DNA as planned, we
have  applied  these  techniques  to  the  plasma  and  serum  of  patients  with  CT  detected  lung  cancer  and  those 
with  non-cancerous  nodules,  and  have  completed  the  writing  of  a  manuscript  containing  these  results.  The 
abstract  of  that  manuscript  is  as  follows  (full  manuscript  in  appendix): 

Purpose: 

To  improve  the  diagnostic  accuracy  of  lung  cancer  screening  using  ultrasensitive  methods  detecting  gene 
promoter  methylation  in  sputum  and  plasma  using  Methylation-On-Beads  (MOB)  with  a  lung  cancer 
specific  gene  panel. 

Patients  and  Methods: 

This is a case-control study of subjects with nodules suspicious for lung cancer on CT imaging in which
plasma and sputum were obtained pre-operatively. Cases (n=150) had pathological confirmation of
node-negative (stage IA, IB and IIA) non-small cell lung cancer while controls (n=60) had non-cancer diagnoses.
We detected promoter methylation using quantitative methylation-specific real-time PCR with MOB for
cancer-specific genes (CDO1, TAC1, HOXA7, HOXA9, SOX17 and ZFP42) identified from The Cancer
Genome Atlas (TCGA).

Results: 

DNA methylation was detected in plasma and sputum more frequently in people with cancer compared to
controls (p<0.001) for 5 of 6 genes examined. The sensitivity and specificity for
lung cancer diagnosis using individual genes from sputum ranged from 63-93% and 42-92% respectively
and from plasma from 33-91% and 52-94%. A three-gene combination including the best individual genes
has sensitivity and specificity of 93% and 79% using sputum and 91% and 64% using plasma. Area under
the Receiver Operating Curve for this panel was 0.89, 95% CI (0.80-0.98), in sputum and 0.77, 95% CI
(0.68-0.86), in plasma. Independent, blinded random forest prediction models combining gene methylation with
age, pack-year, COPD status and FVC values correctly predicted lung cancer in 91% of subjects using
sputum samples and 85% of subjects using blood samples.

Conclusions: 

High  diagnostic  accuracy  for  early  stage  lung  cancer  can  be  obtained  using  methylated  promoter  detection  in 
sputum  or  plasma. 

Table 1. Baseline Characteristics of the 210 Subjects.

Patient Characteristics         Cancer (N=150)       Control (N=60)       p Value
Age at surgery (years) (IQR)    68 (62-75)           63 (55-73)           0.007
Gender
  Male (%)                      63 (42%)             33 (55%)             0.094
  Female (%)                    87 (58%)             27 (45%)
Race
  White (%)                     120 (80%)            51 (85%)
  Black (%)                     19 (13%)             3 (5%)               0.087
  Other (%)                     11 (7%)              6 (10%)
Stage
  IA-IB (%)                     136 (91%)            NA                   NA
  IIA (%)                       14 (9%)              NA
Histology
  Adenocarcinoma (%)            121 (81%)            NA
  Squamous-cell (%)             26 (17%)             NA                   NA
  Adenosquamous (%)             3 (2%)               NA
Smoking status
  Current (%)                   27 (18%)             7 (12%)
  Former (%)                    87 (58%)             34 (57%)             0.176
  Never (%)                     31 (21%)             19 (32%)
Pack-year (IQR)                 30 (10-50)           20 (0-35)            0.010
COPD (%)                        41 (27%)             12 (20%)             0.370
FEV1 % Predicted (IQR)          84 (70-99)           85 (70-100)          0.861
FVC % Predicted (IQR)           92 (80-103)          87 (80-110)          0.682
FEV1/FVC % Ratio (IQR)          73 (68-78)           77 (70-79)           0.080
Nodule size (cm)                2 (1.5-3)            1.5 (1.1-3)          0.01
  < 1 cm                        6 (4%)               13 (22%)
  1-2 cm                        52 (35%)             19 (32%)             0.001
  > 2 cm                        92 (61%)             28 (47%)
Nodule volume (cm3)             4.19 (1.77-14.14)    1.6 (0.52-18.12)     0.001

Abbreviations: Chronic obstructive pulmonary disease: COPD, Forced Expiratory Volume in one second: FEV1, Forced vital
capacity: FVC, Interquartile range: IQR. Nodule size categories (%): <1 cm, 1-2 cm, >2 cm. For grouped rows, the p value
shown applies to the whole group.

Methylation was readily detected for these loci in the majority of cancer patients, but not in most control
patients. Quantitation of the methylation was carried out with the following DNA methylation analysis:
The genomic sequence for the genes and 1000 bases upstream was obtained from the UCSC Genome Browser website. The
primers and hybridization probes for methylation analysis were designed based on this sequence by using Primer3 (v. 0.4.0). The
analysis was performed using quantitative real-time Methylation Specific PCR, and normalized to a control β-Actin assay. Each
reaction was performed in a 25 µl PCR mixture consisting of 2 µl of bisulfite converted DNA, 300 nM R-sense primer, 300 nM
F-anti-sense primer, 100 nM probe, 100 nM of fluorescein reference dye (Life Technologies), 1.67 mM dNTPs (VWRQuotation), and
1 µl of Platinum Taq® DNA Polymerase (Invitrogen). Master mix contained 16.6 mM (NH4)2SO4, 67 mM Tris pH 8.8, 6.7 mM
MgCl2 and 10 mM β-mercaptoethanol in a nuclease-free DI water solution. Amplification reactions were performed using
96-well plates (MicroAmp®) with all samples being analyzed in triplicate. Thermocycling conditions were as follows: 95°C for 5 min, 50
cycles at 95°C for 15 seconds, and 65°C for 1 min and 72°C for 1 min. An ABI StepOnePlus Real-Time PCR system was used
(Applied Biosystems).

With the extremely low levels of DNA methylation in plasma and sputum, replicates for some samples produced no
detectable methylation, as expected. To incorporate this information into the final quantification of methylation, we calculated the
2^(-ΔCT) for each methylation detection replicate, comparing it to the mean Ct for β-Actin (ACTB). For replicates which were not
detected (ND), a CT of 100 was used, creating a near zero value for 2^(-ΔCT). The mean 2^(-ΔCT) value was calculated with the formula:

$$\mu_{2^{-\Delta CT}} = \frac{2^{-\Delta CT_1} + 2^{-\Delta CT_2} + 2^{-\Delta CT_3}}{3}$$

where ΔCT_i is the Ct of methylation replicate i minus the mean ACTB Ct.
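
The triplicate averaging described above, including the substitution of a Ct of 100 for non-detected replicates, can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in the study; the function name, the use of `None` to mark a non-detected replicate, and the example Ct values are all assumptions.

```python
ND_CT = 100.0  # Ct assigned to non-detected replicates, as stated in the text


def mean_two_neg_delta_ct(replicate_cts, actb_mean_ct):
    """Mean 2^-dCt over replicates; None marks a non-detected (ND) replicate."""
    values = []
    for ct in replicate_cts:
        ct = ND_CT if ct is None else ct
        # dCt is the replicate Ct minus the mean ACTB Ct.
        values.append(2.0 ** -(ct - actb_mean_ct))
    return sum(values) / len(values)


# Hypothetical triplicate: two detected replicates and one ND.
print(mean_two_neg_delta_ct([30.0, 31.0, None], actb_mean_ct=28.0))  # about 0.125
```

Because a non-detected replicate contributes essentially zero, a sample with one or two ND replicates is pulled toward lower methylation rather than being dropped from the analysis.
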

The results of the methylation analyses for these six genes in the 210 patients are shown in Figure 4 for
tumor, sputum and plasma.


Figure 4. Methylation level (normalized to beta actin) for 6 genes detected in tumor, plasma (blood)
and sputum from patients with lung cancer and non-cancer controls, plotted on log scale. Each dot
represents the calculated level of methylation using the formula above from triplicates. Note the nearly
universal detection of methylation in all tumor tissues, and at higher quantitative levels than seen in biologic
fluids (as expected given the relative amounts of tumor DNA in tissue samples compared to plasma or
sputum). Plasma and sputum samples vary in quantity of methylation from equal to that in tumor to very
low level detection (10^-5 to 10^-6). This low level detection was not possible without the integrated methods
described for this proposal.

With these results, we calculated the analytic accuracy of methylation detection.

Table 2. Gene Methylation Sensitivity, Specificity, AUC and Association with Cancer
Diagnosis for genes obtained from Sputum and Blood.

Sputum                  Sensitivity   Specificity   PPV    NPV    AUC     95% CI
CDO1                    78%           67%           90%    45%    0.70    (0.57-0.84)
TAC1                    84%           79%           94%    57%    0.84    (0.74-0.94)
HOXA7                   63%           92%           97%    40%    0.77    (0.67-0.86)
HOXA9                   77%           42%           83%    32%    0.56    (0.41-0.69)
SOX17                   84%           88%           96%    59%    0.84    (0.75-0.94)
ZFP42                   88%           62%           90%    58%    0.73    (0.60-0.87)
TAC1, HOXA7, SOX17      93%           79%           94%    75%    0.89    (0.80-0.98)

Blood                   Sensitivity   Specificity   PPV    NPV    AUC     95% CI
CDO1                    65%           74%           86%    46%    0.68    (0.58-0.77)
TAC1                    76%           78%           90%    57%    0.78    (0.70-0.86)
HOXA7                   33%           94%           93%    36%    0.60    (0.51-0.69)
HOXA9                   81%           52%           81%    52%    0.62    (0.52-0.73)
SOX17                   71%           86%           93%    54%    0.78    (0.70-0.86)
ZFP42                   81%           58%           83%    55%    0.66    (0.56-0.75)
CDO1, TAC1, SOX17       91%           64%           86%    74%    0.77    (0.68-0.86)

Abbreviations: area under the curve (in the ROC curves): AUC, 95% confidence interval: 95% CI.
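
For readers less familiar with the accuracy measures in Table 2, the sketch below shows how sensitivity, specificity, PPV and NPV are derived from a 2x2 table of test result against diagnosis. The counts are hypothetical and are not taken from the study; the function name is illustrative.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # methylation-positive among cancers
        "specificity": tn / (tn + fp),  # methylation-negative among controls
        "ppv": tp / (tp + fp),          # cancers among methylation-positive
        "npv": tn / (tn + fn),          # controls among methylation-negative
    }


# Hypothetical counts, not taken from the study data.
metrics = diagnostic_metrics(tp=90, fn=10, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of cases among the subjects tested, so values from a case-control sample do not transfer directly to a screening population with a different cancer prevalence.
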

Figure 5. Receiver operating characteristic (ROC) curves for lung cancer detection.

A. ROC curves comparing the 3 genes with the largest areas under the curve for sputum. B. ROC
curves comparing the 3 genes with the largest areas under the curve for blood. C. ROC of the combined
methylation status of the genes from sputum with the largest area under the curve. D. ROC of the
combined methylation status of the genes from blood with the largest area under the curve.

Abbreviations: area under the curve: AUC, 95% confidence interval: 95% CI.

Independent Prediction Accuracy Performance: While the above analysis looked at individual gene
methylation in cases and controls to detect cancer, independent blinded random forest prediction models
analyzed all these biomarkers in combination with clinical risk factors. Risk factors included in the first
two random forest prediction models were methylation Ct values from all six genes, age, pack-year, COPD
status and FVC values. The methylation Ct values were not included in the last prediction model. The
randomly selected training dataset has 140 subjects with 99 (70.7%) cancers and 41 (29.3%) controls. The
independent test set has 70 subjects with 51 (72.9%) cancers and 19 (27.1%) controls. In the variable
importance output of the first two random forest prediction models, methylation Ct values were ranked as
more important variables than demographic and clinical variables (Figure 6). Table 3 summarizes the
prediction accuracies of these three models when they were applied to the independent test set patients.
With sputum samples, the random forest model correctly predicted lung cancer in 91% of subjects in the
testing subset. The corresponding AUC was 0.85, 95% CI (0.59-1.0). The sensitivity and specificity of the
prediction in the testing subset from the ROC curve were 0.93 and 0.86, respectively. Using plasma
samples, the random forest model correctly predicted lung cancer in 85% of subjects in the testing subset.
The corresponding AUC was 0.89, 95% CI (0.79-0.99). The sensitivity and specificity of the prediction in
the testing subset from the ROC curve were 0.93 and 0.67, respectively. Using clinical and demographic
risk factors alone, the accuracies were lower than for the first two models, with a diagnostic accuracy of
68%, AUC of 0.64, PPV of 75% and an NPV of 38% (Table 3).


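[Figure 5 plots. Horizontal axis: false positive rate.]

[Figure 6 plot. Variables as listed on the vertical axis, top to bottom: SOX17 Sputum, TAC1 Sputum,
ZFP42 Sputum, HOXA7 Sputum, CDO1 Sputum, Age at surgery, HOXA9 Sputum, Pack Years, COPD.]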

Figure 6. Variable importance plot for random forest prediction. The plot details the relative importance
of each of the variables to the model's accuracy (including: methylation mean 2^(-ΔCT) values, nodule size,
age, pack-year, COPD status and FVC values). The x-axis is the mean decrease in the Gini index
attributable to splits on that variable, averaged over the trees in the random forest. The Gini index is a
measure of node impurity, that is, of how mixed cancer and non-cancer subjects are within a node of a
decision tree. Variables with the largest mean decrease in the Gini index produce the purest splits across
the individual decision trees used in the model and are therefore most predictive of the outcome of the
model overall. Variables with a small mean decrease in the Gini index are relatively less important to the
prediction made by the random forest model.
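
To make the Gini-based importance measure in Figure 6 concrete, the sketch below computes the Gini impurity of a node and the decrease obtained by one split. It illustrates the quantity that random forest software accumulates per variable across splits and trees; it is not the software used in the study, and the labels and the split are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 6 cancers (1) and 4 controls (0), split on a marker threshold.
parent = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
left = [1, 1, 1, 1, 1, 0]   # marker above threshold
right = [1, 0, 0, 0]        # marker below threshold
print(round(gini_decrease(parent, left, right), 3))
```

A split that separated the two classes perfectly would remove all of the parent node's impurity, which is why variables that split cleanly rank highest in the plot.
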

Table 3. Performance for lung cancer diagnosis of the independent blinded random
forest prediction models on the testing subset

Model                        Sensitivity   Specificity   PPV    NPV    AUC     95% CI
Prediction from Sputum       93%           86%           96%    75%    0.85    0.59-1
Prediction from Blood        93%           67%           87%    80%    0.89    0.79-0.99
Clinical Predictors alone    84%           26%           75%    38%    0.64    0.50-0.78

Abbreviations: area under the curve (in the ROC curves): AUC, 95% confidence interval: 95% CI.

In the final period of our funding, the PI (Dr. Herman) moved to the University of Pittsburgh and has begun
to implement this approach in samples from the Pittsburgh Lung Screening Study (PLuSS) and the
Pittsburgh Lung Cancer SPORE. This will form a validation cohort and be used to further improve this
already promising approach.

For specific aim 3, we have to date identified 210 subjects in the SPORE database who had CT scans
performed prior to surgery that were adequate for analysis. We have completed measurement of the
extent of emphysema on computed tomography (CT) in these subjects. Of the group, 168 of the subjects had
cancer, and 42 did not.

The software can divide the lung into upper, middle, and lower fields on the right and the left for a total of
six lung areas. For the subjects, clearly abnormal areas were eliminated from further analysis. For the Ca+
subjects, the final usable numbers of lung fields were right upper=106, right middle=111, right lower=108,
left upper=118, left middle=118, and left lower=116. For the Ca- subjects, we have 103 for each lung field.
The emphysema score was based on the number of voxels with Hounsfield units (HUs) less than -910. The
percent emphysema of the lungs ranged from 0.19 to 56% among all the subjects with a mean score of
28.8±15% (mean±SD). The subjects with and without cancer had a similar amount of emphysema (29±15%
and 27±14%, respectively; p=0.42). This suggests that simple screening for emphysema would not allow for
detection of lung cancer.
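
The emphysema score described above is a voxel-counting measure. A minimal sketch follows, assuming the lung has already been segmented and its voxel attenuation values are available as a flat list of Hounsfield units; the function name and the example values are hypothetical, and only the -910 HU threshold comes from the text.

```python
EMPHYSEMA_THRESHOLD_HU = -910  # threshold stated in the text


def percent_emphysema(voxel_hu, threshold=EMPHYSEMA_THRESHOLD_HU):
    """Percentage of lung voxels with attenuation below the threshold."""
    if not voxel_hu:
        raise ValueError("no voxels supplied")
    below = sum(1 for hu in voxel_hu if hu < threshold)
    return 100.0 * below / len(voxel_hu)


# Hypothetical voxel values in Hounsfield units, not study data.
voxels = [-950, -930, -905, -880, -860, -700, -915, -890]
print(percent_emphysema(voxels))  # 3 of 8 voxels are below -910 HU -> 37.5
```
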

We continued the study of CT images, examining lung heterogeneity by comparing the ratio of the upper to
the lower lung with regard to the mean, standard deviation, <-950 HU, and 15th percentile of the CT scans
between cancer and non-cancer subjects. The software divides the scans into right and left, allowing the
examination of these 4 variables for each side separately. The only variable that was significant
on both the right (p=0.0091) and left (p=0.0454) was the 15th percentile. The mean HU value for
the right side approached significance (p=0.0785).

We next combined the right and left sides to have a single overall measure: the average (right-left)
upper to lower ratios for the mean CT score, standard deviation of the mean, <-950 HU, and
15th percentile of the CT scans, comparing the cancer to the non-cancer subjects. Again, the average
(right-left) upper to lower ratio for the 15th percentile was significant (p=0.0014) (Figure 8). In addition,
the average (right-left) upper to lower ratio for the mean CT density was also significant (p=0.04) (Figure 7).
Neither the standard deviation (p=0.43) nor the <-950 HU (p=0.26) measurements were significant.
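
The upper-to-lower ratio of the 15th percentile, averaged over the right and left lungs, can be sketched as follows. This is an illustration under stated assumptions, not the analysis software used in the study: the nearest-rank percentile definition, the function names, and the example values are all hypothetical choices.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def upper_lower_ratio(upper_hu, lower_hu, pct=15):
    """Ratio of the pct-th percentile HU in the upper vs. lower lung field."""
    return percentile_nearest_rank(upper_hu, pct) / percentile_nearest_rank(lower_hu, pct)


def averaged_ratio(right_upper, right_lower, left_upper, left_lower, pct=15):
    """Average of the right-side and left-side upper/lower ratios."""
    right = upper_lower_ratio(right_upper, right_lower, pct)
    left = upper_lower_ratio(left_upper, left_lower, pct)
    return (right + left) / 2.0


# Hypothetical HU samples for each field, not study data.
ru = [-950, -940, -900, -880, -870, -860, -850, -840, -830, -820]
rl = [-920, -900, -880, -870, -860, -850, -840, -830, -820, -810]
print(round(averaged_ratio(ru, rl, ru, rl), 4))
```

Because HU values in aerated lung are negative, a ratio above 1 in this sketch means the upper field has lower attenuation at the 15th percentile than the lower field.
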

Figure 7. Ratio (U/L) Mean CT density


t Test; Assuming equal variances

Difference      0.025567     t Ratio       2.066335
Std Err Dif     0.012373     DF            208
Upper CL Dif    0.049961     Prob > |t|    0.0400*
Lower CL Dif    0.001174     Prob > t      0.0200*
Confidence      0.95         Prob < t      0.9800

These  findings  suggest  that  the  upper  to  lower  ratio  of  the  mean  intensity  may  be  an 
independent  predictor  of  lung  cancer. 


Figure 8. Ratio (U/L) 15th percentile

t Test; Assuming equal variances

Difference      0.021611     t Ratio       3.228527
Std Err Dif     0.006694     DF            208
Upper CL Dif    0.034807     Prob > |t|    0.0014*
Lower CL Dif    0.008415     Prob > t      0.0007*
Confidence      0.95         Prob < t      0.9993

These  findings  suggest  that  the  upper  to  lower  ratio  of  the  15th  percentile  of  lung  intensity  may  also  be  an 
independent  predictor  of  lung  cancer. 
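
As a consistency check on the t-test output reported for Figures 7 and 8, the t ratio is the difference in group means divided by its standard error. The short sketch below recomputes it from the reported values; the function name is illustrative, and small discrepancies from the reported t ratios are expected because the inputs are rounded.

```python
def t_ratio(difference: float, std_err: float) -> float:
    """Two-sample t statistic: difference in means over its standard error."""
    return difference / std_err


# Values as reported in the Figure 7 and Figure 8 output.
print(round(t_ratio(0.025567, 0.012373), 3))  # mean CT density, reported 2.066335
print(round(t_ratio(0.021611, 0.006694), 3))  # 15th percentile, reported 3.228527
```

The one-sided probabilities in the output (Prob > t) are half the corresponding two-sided values (Prob > |t|), as expected for a symmetric t distribution.
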

We entered these two significant variables into a single Generalized Linear Model (GLM) as independent
variables, with cancer as the outcome variable. In this model, only the 15th percentile was significant
(p=0.0055).

Generalized Linear Model Fit

Response: Cancer
Modeling P(Cancer=NO)
Distribution: Binomial
Link: Logit
Estimation Method: Maximum Likelihood
Observations (or Sum Wgts) = 210

Whole Model Test
Model          -LogLikelihood    L-R ChiSquare    DF    Prob>ChiSq
Difference     6.26033276        12.5207          2     0.0019*
Full           98.8241762
Reduced        105.084509

Effect Tests
Source           DF    L-R ChiSquare    Prob>ChiSq
R+L U/L mean     1     0.6205194        0.4309
R+L U/L 15%      1     7.7075925        0.0055*

Parameter Estimates
Term             Estimate      Std Error    L-R ChiSquare    Prob>ChiSq
Intercept        18.243874     6.0817368    10.530641        0.0012*
R+L U/L mean     3.6291227     4.568479     0.6205194        0.4309
R+L U/L 15%      -23.20951     8.6952667    7.7075925        0.0055*


This  analysis  suggests  that  only  the  upper  to  lower  ratio  of  the  15th  percentile  of  lung  intensity  may  be  a 
predictor  of  lung  cancer. 
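
The fitted model can be read as a logistic equation for P(Cancer=NO). The sketch below plugs the reported parameter estimates into the logit link and recomputes the whole-model likelihood-ratio chi-square from the reported negative log-likelihoods. The function names and the example ratio values are hypothetical; the coefficients and log-likelihoods are those given in the model output.

```python
import math

# Parameter estimates as reported in the GLM output (modeling P(Cancer=NO)).
INTERCEPT = 18.243874
BETA_MEAN_RATIO = 3.6291227
BETA_P15_RATIO = -23.20951


def prob_no_cancer(mean_ratio: float, p15_ratio: float) -> float:
    """Predicted P(Cancer=NO) from the logit link."""
    z = INTERCEPT + BETA_MEAN_RATIO * mean_ratio + BETA_P15_RATIO * p15_ratio
    return 1.0 / (1.0 + math.exp(-z))


def likelihood_ratio_chisq(neg_loglik_reduced: float, neg_loglik_full: float) -> float:
    """Likelihood-ratio chi-square: twice the drop in negative log-likelihood."""
    return 2.0 * (neg_loglik_reduced - neg_loglik_full)


print(round(likelihood_ratio_chisq(105.084509, 98.8241762), 4))  # 12.5207
# Hypothetical ratios, chosen only to illustrate the direction of the effect.
print(prob_no_cancer(1.0, 0.95) > prob_no_cancer(1.0, 1.00))  # True
```

Under this fit, a higher upper-to-lower ratio of the 15th percentile lowers the predicted probability of no cancer, which matches the negative sign of its coefficient.
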


KEY  RESEARCH  ACCOMPLISHMENTS: 

•  Completion of sputum and plasma analysis from 210 subjects in a case-control study of 150
with early stage lung cancer and 60 non-cancer controls.

•  Demonstration of specific and sensitive detection of cancer-specific DNA methylation as an
early detection biomarker.

•  Transition of studies to the University of Pittsburgh.

•  Studies of emphysema and variability scores completed in 127 subjects with a diagnosis of lung
cancer and 180 subjects without a diagnosis of lung cancer.


4.  CONCLUSION: 


In  summary,  based  on  our  previous  development  of  an  improved  panel  of  genes  hypermethylated  in  lung 
cancer,  with  extraordinarily  high  specificity  and  sensitivity,  we  combined  the  improved  methods  of  MOB 
with  highly  sensitive  methylation  specific  PCR  assays  suitable  for  biologic  fluid  testing  (sputum  and  serum) 
and  completed  the  study  of  a  cohort  of  cancer  positive  and  negative  samples.  In  combination  with  these 
molecular  detection  approaches,  we  have  examined  the  alterations  in  air  space  for  improving  detection  of 
lung cancer and find that variability of air spaces is associated with the presence of lung cancer. During
the period of this grant, we have developed a highly sensitive and specific method for early detection of
lung cancer.

6.  PUBLICATIONS,  ABSTRACTS,  AND  PRESENTATIONS: 

2015 Meeting of the American Association for Cancer Research (AACR) in Philadelphia, Pennsylvania
2015 Meeting of the International Association for the Study of Lung Cancer (IASLC) in Denver, Colorado

7.  INVENTIONS, PATENTS AND LICENSES: Early Detection of Lung Cancer Using DNA
Methylation in Plasma and Sputum, JHU C13599, licensing being pursued with Cepheid.

8.  REPORTABLE  OUTCOMES:  Nothing  to  report 

9.  OTHER  ACHIEVEMENTS:  Nothing  to  report 

Early Detection of Lung Cancer using DNA Promoter
Hypermethylation in Plasma and Sputum

Abstract: 

Purpose:

To  improve  the  diagnostic  accuracy  of  lung  cancer  screening  using  ultrasensitive  methods  detecting 
gene  promoter  methylation  in  sputum  and  plasma  using  Methylation-On-Beads  (MOB)  with  a  lung 
cancer  specific  gene  panel. 

Patients  and  Methods: 

This is a case-control study of subjects with nodules suspicious for lung cancer on CT imaging in
which plasma and sputum were obtained pre-operatively. Cases (n=150) had pathological confirmation
of node-negative (stage IA, IB and IIA) non-small cell lung cancer while controls (n=60) had
non-cancer diagnoses. We detected promoter methylation using quantitative methylation-specific real-time
PCR with MOB for cancer-specific genes (CDO1, TAC1, HOXA7, HOXA9, SOX17 and ZFP42)
identified from The Cancer Genome Atlas (TCGA).

Results: 

DNA methylation was detected in plasma and sputum more frequently in people with cancer compared
to controls (p<0.001) for 5 of 6 genes examined. The sensitivity and
specificity for lung cancer diagnosis using individual genes from sputum ranged from 63-93% and
42-92% respectively and from plasma from 33-91% and 52-94%. A three-gene combination including the
best individual genes has sensitivity and specificity of 93% and 79% using sputum and 91% and 64%
using plasma. Area under the Receiver Operating Curve for this panel was 0.89, 95% CI (0.80-0.98), in
sputum and 0.77, 95% CI (0.68-0.86), in plasma. Independent, blinded random forest prediction models
combining gene methylation with age, pack-year, COPD status and FVC values correctly predicted
lung cancer in 91% of subjects using sputum samples and 85% of subjects using blood samples.

Conclusions: 

High  diagnostic  accuracy  for  early  stage  lung  cancer  can  be  obtained  using  methylated  promoter 
detection  in  sputum  or  plasma. 

Background


Lung cancer is the third most prevalent cancer with over 224,000 cases annually in the
U.S. (3, 4) It is the most deadly cancer worldwide, accounting for almost 27% of all cancer-related
deaths, in part because of advanced stage at diagnosis in 67% of cases. (3, 5) The National Lung
Screening Trial (NLST) demonstrated a 20% reduction in lung cancer mortality using low-dose
computed tomography (CT) screening. (6) This survival benefit comes at the price of detecting many
indeterminate, small pulmonary nodules with a false positive rate of 96.4%. (6, 7) This has led to
cautious adoption of CT screening, because complications, and even deaths, result from further
diagnostic procedures. (8)

One approach to improving the specificity of CT screening involves the use of cancer-specific
biomarkers from sputum and plasma. The epigenetic alteration of promoter DNA methylation is
associated with the initiation and progression of cancer, (9-15) and may be used as a biomarker for
cancer risk, prevention, treatment, and prognosis. (1, 16-25) However, previous approaches had limited
sensitivity and specificity and were not adequate for lung cancer screening. (16-25)

Reduced sensitivity of methylation detection may result from technical limitations. Traditional DNA extraction methods, such as phenol-chloroform, are inefficient for small amounts of DNA because repeated sample transfers cause sample loss, and DNA is further degraded during bisulfite conversion.(9, 26) We have developed Methylation-on-Beads (MOB), which combines extraction and bisulfite conversion into a single process, reducing sample loss and potentially increasing sensitivity.(27-29)

In addition, previous studies have primarily selected genes for DNA methylation detection through a candidate approach, and these genes are methylated in only a fraction of tumors. The Cancer Genome Atlas (TCGA)(30) provides the opportunity to discover cancer-specific methylation changes optimal for detection. We previously reported the identification of six genes (CDO1, HOXA7, HOXA9, TAC1, SOX17, and ZFP42) with a high prevalence of methylation in lung squamous cell carcinoma and adenocarcinoma, but not in normal lung tissue.(1) These were developed into sensitive assays using MOB and quantitative real-time Methylation-Specific PCR (qMSP) to determine the diagnostic accuracy in sputum and plasma for lung cancer detection in a case-control study.
Patients and Methods

Study Population

The study population consists of a prospective, observational cohort of 651 participants, initiated in 2007 within the Johns Hopkins Lung Cancer Specialized Program of Research Excellence (SPORE), to monitor cancer recurrence after surgery. From this cohort, 210 study patients had node-negative early stage tumors (T1-T2N0) and samples adequate for analysis. Institutional review board approval was obtained prior to the start of this study (NA_00005998), and all patients signed informed consent. Surgical resection with curative intent and pathological analyses of suspected lung cancer lesions were completed in all patients. Patients were staged according to the revised TNM classification criteria.(31) Cases were defined as patients with lung cancer confirmed by pathology. Controls were defined as patients histologically confirmed not to have cancer. Plasma and sputum samples were obtained prior to surgical resection. Pack-years of cigarette smoking was defined as the average number of packs smoked per day multiplied by the number of years of smoking. Nodule size was obtained from the pathological report. Nodule volume, obtained from surgical pathological reports, was calculated with the ellipsoid volume formula (Volume = 4/3 × π × radius A × radius B × radius C).
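The two derived quantities defined above, pack-years and ellipsoid nodule volume, can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of the study's software; the sketch only restates the definitions given in the text.

```python
import math


def pack_years(packs_per_day, years_smoked):
    """Pack-years: average packs smoked per day times years of smoking."""
    return packs_per_day * years_smoked


def ellipsoid_volume_cm3(radius_a_cm, radius_b_cm, radius_c_cm):
    """Ellipsoid volume: 4/3 * pi * radius A * radius B * radius C."""
    return 4.0 / 3.0 * math.pi * radius_a_cm * radius_b_cm * radius_c_cm


# Illustrative inputs only (not patient data).
example_pack_years = pack_years(1.5, 20)
example_volume = ellipsoid_volume_cm3(1.0, 1.0, 1.0)
```

With these illustrative inputs, 1.5 packs per day for 20 years corresponds to 30 pack-years, and three radii of 1 cm give a volume of about 4.19 cm3.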

Plasma and Sputum Collection

Prior to surgery, 20 ml of plasma was collected in tubes containing sodium heparin (Becton Dickinson, Franklin Lakes) and then stored at -80°C. For sputum collection, two cups containing Saccomanno's fixative solution were used for each patient as previously described.(17, 20, 32) Subjects were asked to provide an early morning spontaneous sputum sample at home in two cups for 3 consecutive days within 1 week prior to pulmonary resection.(20, 33) Five milliliters of sputum was collected, washed with Saccomanno's solution, vortexed, centrifuged and then stored at -80°C.(17)

DNA Isolation and Bisulfite Conversion

DNA extraction from tumor, plasma and sputum was performed using MOB, a process that allows DNA extraction and bisulfite conversion in a single tube via the use of silica superparamagnetic beads.(27) This approach yields a 1.5- to 5-fold improvement in extraction efficiency with small amounts of DNA in comparison with conventional techniques.(29) We optimized the protocol previously described for plasma,(29) using 1.5 ml of plasma and 375 µl of proteinase K (800 units/ml, NEBL p8107s). For DNA extraction from sputum using the MOB method, we modified the plasma protocol by adding 200 µl of sample to 300 µl of Buffer AL and 40 µl of proteinase K and incubating them together at the same temperature (50 °C for 2 hours). After digestion, 300 µl of IPA and 150 µl of beads were added. The lysate was then incubated and rotated for 10 minutes before adding 5 µl of carrier RNA and incubating for an additional 5 minutes.(29)

DNA  Methylation  Analysis 

The genomic sequence for the genes and 1000 bases upstream was obtained from the UCSC genome browser website.(34) The primers and hybridization probes for methylation analysis were designed based on this sequence using Primer3 (v.0.4.0).(35, 36) All primer and probe sequences are listed in Supplementary Table S1. The analysis was performed using quantitative real-time Methylation-Specific PCR and normalized to a control β-actin assay.(26) Each reaction was performed in a 25 µl PCR mixture consisting of 2 µl of bisulfite-converted DNA, 300 nM R-sense primer, 300 nM F-anti-sense primer, 100 nM probe, 100 nM fluorescein reference dye (Life Technologies), 1.67 mM dNTPs (VWRQuotation), and 1 µl of Platinum Taq® DNA Polymerase (Invitrogen). The master mix contained 16.6 mM (NH4)2SO4, 67 mM Tris pH 8.8, 6.7 mM MgCl2 and 10 mM β-mercaptoethanol in nuclease-free DI water. Amplification reactions were performed in 96-well plates (MicroAmp®), with all samples analyzed in triplicate. Thermocycling conditions were as follows: 95°C for 5 min, followed by 50 cycles of 95°C for 15 seconds, 65°C for 1 min and 72°C for 1 min. An ABI StepOnePlus Real-Time PCR system was used (Applied Biosystems; examples shown in Supplemental Figure 1).

Given the extremely low levels of DNA methylation in plasma and sputum, replicates for some samples produced no detectable methylation, as expected. To incorporate this information into the final quantification of methylation, we calculated 2^-ΔCT for each methylation detection replicate by comparing its CT to the mean CT for β-actin (ACTB). For replicates which were not detected (ND), a CT of 100 was used, creating a near-zero value for 2^-ΔCT. The mean 2^-ΔCT value across the three replicates was calculated with the formula:

\[
\overline{2^{-\Delta C_T}} = \frac{2^{-\Delta C_T^{\,\text{replicate 1}}} + 2^{-\Delta C_T^{\,\text{replicate 2}}} + 2^{-\Delta C_T^{\,\text{replicate 3}}}}{3}
\]
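The replicate averaging described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, a not-detected replicate is represented as `None`, and ΔCT is taken as the replicate CT minus the mean ACTB CT.

```python
ND_CT = 100.0  # CT assigned to not-detected (ND) replicates, as stated in the text


def mean_two_pow_neg_delta_ct(gene_cts, actb_mean_ct):
    """Average 2^-dCT over replicates.

    gene_cts: replicate CT values for the methylated gene; None marks ND.
    actb_mean_ct: mean CT of the ACTB control for the same sample.
    """
    values = []
    for ct in gene_cts:
        ct = ND_CT if ct is None else ct
        delta_ct = ct - actb_mean_ct          # assumed sign convention
        values.append(2.0 ** (-delta_ct))
    return sum(values) / len(values)


# Illustrative values only: one detected replicate and two ND replicates.
all_detected = mean_two_pow_neg_delta_ct([30.0, 30.0, 30.0], 25.0)
one_detected = mean_two_pow_neg_delta_ct([30.0, None, None], 25.0)
```

In this example the ND replicates contribute essentially zero, so a sample detected in one of three replicates receives about one third of the value of a sample detected in all three, rather than being discarded.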
Statistical Analysis

Quantitative data are expressed as median (interquartile range) for continuous, non-parametric variables and frequency (percentage) for categorical variables. For inter-group comparison, the Wilcoxon rank sum test was used for continuous data and Fisher's exact test for categorical data.

Data were analyzed using two approaches. The first approach was ROC analysis using the 2^-ΔCT values for individual genes to determine the performance of each individual marker. The three best performing genes were selected for diagnostic accuracy for lung cancer detection, based on receiver operating characteristic (ROC) curves, and were used for combined detection. Sensitivity and specificity values were obtained from the optimum cutoff thresholds of the ROC curves (R statistical software, version 3.0.2, Vienna, Austria).(37) The area under the curve (AUC) was reported with 95% CIs.
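To make the ROC step concrete, the following Python sketch computes an AUC and picks a cutoff from a set of marker values. It is an illustration only: the study used R, and the text does not say how the "optimum" cutoff was defined, so the use of Youden's J (sensitivity + specificity - 1) here is an assumption. The function names and the toy data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each candidate cutoff.

    A sample is called positive when its score is >= threshold.
    labels: 1 for cancer, 0 for control.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Cutoff maximizing Youden's J (an assumed optimality criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1.0)


def auc(scores, labels):
    """AUC as the probability that a case scores above a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy 2^-dCT values (not study data): four controls followed by four cases.
toy_scores = [0.0, 0.0, 0.001, 0.02, 0.0, 0.03, 0.5, 0.8]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
best = youden_cutoff(toy_scores, toy_labels)
toy_auc = auc(toy_scores, toy_labels)
```

On the toy data the selected cutoff is 0.03, with sensitivity 0.75 and specificity 1.0, and the AUC is 0.8125. Other cutoff criteria (for example, the point closest to the top-left corner) can select a different threshold on the same curve.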

The second approach utilized independent blinded random forest prediction models, a non-parametric machine learning method, to evaluate the utility of the six-gene panel and clinical data in early lung cancer detection. The analysis combined gene methylation with clinical risk factors: nodule size, age, pack-years, COPD status and FVC values (Figure 4). Two-thirds of subjects were randomly selected as a training set and the remainder formed the test set. A statistician (PH), blinded to the true diagnosis codes of the test set patients, used the training set to build three random forest prediction models: the first used the six-gene sputum biomarkers plus clinical and demographic risk factors, the second used the six-gene plasma biomarkers plus clinical and demographic risk factors, and the third used only clinical and demographic risk factors without any methylation biomarkers. These three models were used to predict cancer status in the independent test set. Prediction accuracy was reported as the proportion of test set subjects correctly predicted by the random forest classification models, allowing calculation of sensitivity, specificity, and ROC analysis.
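The train/test design described above can be sketched in Python. The sketch covers only the random two-thirds split and the accuracy calculation; it does not implement the random forest models themselves. The function names and the fixed seed are hypothetical and are not taken from the study.

```python
import random


def split_train_test(subject_ids, train_fraction=2 / 3, seed=0):
    """Randomly assign subjects to a training set and an independent test set."""
    rng = random.Random(seed)          # fixed seed (assumed) for reproducibility
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predicted, actual):
    """Proportion of test subjects whose predicted status matches the diagnosis."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# 210 subjects split two-thirds / one-third.
train_ids, test_ids = split_train_test(range(210))
```

With 210 subjects this yields 140 training and 70 test subjects. Keeping the test diagnoses hidden from the model builder, as the blinded design does, guards against tuning the models to the test set.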

Results 

Characteristics  of  the  Patients 

Two hundred and ten patients fulfilled the inclusion criteria, with 150 node-negative early stage lung cancer subjects and 60 controls with non-cancerous lung lesions (Table 1). Clinical and demographic variables were similar in cases and controls with the exception of age, number of pack-years, and nodule size (cm) and volume (cm3). Subjects with lung cancer were significantly older than controls (68 vs. 63 years, p=0.007), smoked significantly more (30 vs. 19.5 pack-years, p=0.01), and had significantly larger nodules (2.0 vs. 1.5 cm, p=0.01). The proportion of current smokers, former smokers and never smokers was not different between cases and controls.


Detection  of  DNA  Methylation 

We measured DNA methylation for these genes in tumor tissue, confirming our previous study, which suggested that these genes are methylated in the majority of lung tumors (Figure 1). Methylation in sputum was detected more frequently for all 6 genes in cancer patients compared to controls (Figure 1); for some patients the level was quantitatively similar to that of lung tumor tissue, but in some cases it was below the detection limit of conventional methods. For 5 of the 6 genes (CDO1, TAC1, HOXA7, SOX17 and ZFP42), this difference was statistically significant (p < 0.001). Methylation of all 6 genes was detected more frequently in plasma in cases compared to controls (p < 0.001). The worst performing gene was HOXA9 in plasma, which showed a lack of specificity, as was also seen in sputum. We determined the sensitivity and specificity in this cohort using the presence or absence of detectable methylation, without considering the quantitation of methylation. This resulted in good sensitivities and specificities (Table 2A).
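As a worked illustration of how the detectable versus non-detectable cutoff translates into accuracy measures, the sketch below recomputes sensitivity, specificity, PPV and NPV from counts. The function name is hypothetical; the example counts are the CDO1 sputum row of Table 2A (70 of 90 cancers and 8 of 24 controls with detectable methylation).

```python
def diagnostic_metrics(cases_pos, n_cases, controls_pos, n_controls):
    """Sensitivity, specificity, PPV and NPV from detected/not-detected counts."""
    tp = cases_pos                      # cancers with detectable methylation
    fn = n_cases - cases_pos            # cancers without detectable methylation
    fp = controls_pos                   # controls with detectable methylation
    tn = n_controls - controls_pos      # controls without detectable methylation
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# CDO1 in sputum, counts from Table 2A.
cdo1_sputum = diagnostic_metrics(70, 90, 8, 24)
```

Rounded to whole percentages, this gives 78% sensitivity, 67% specificity, 90% PPV and 44% NPV, matching the tabulated values. PPV and NPV depend on the case-to-control ratio of this surgical cohort and would differ in a screening population with lower cancer prevalence.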

Gene Methylation and Lung Cancer Diagnostic Accuracy

ROC curves for lung cancer detection were obtained for each single gene using the normalized methylation 2^-ΔCT values calculated as described in Methods (Table 2; ROC curves in Supplemental Figures 2 and 3). The sensitivity and specificity for lung cancer diagnosis from single methylated genes ranged from 63-93% and 42-92%, respectively, in sputum and from 33-91% and 52-94% in plasma. The AUC values were 0.56-0.89 in sputum samples and 0.60-0.78 in plasma samples.

The genes with the largest AUC in sputum were TAC1 (AUC 0.84; 95% CI, 0.74-0.94), SOX17 (AUC 0.84; 95% CI, 0.75-0.94) and HOXA7 (AUC 0.77; 95% CI, 0.67-0.86) (Figure 2A), with sensitivities and specificities of 84% and 79% for TAC1, 84% and 88% for SOX17, and 63% and 92% for HOXA7, respectively. The positive and negative predictive values for these three genes were 94% and 57% for TAC1, 96% and 59% for SOX17, and 97% and 40% for HOXA7, respectively.

In plasma, the genes with the largest AUC were CDO1 (AUC 0.68; 95% CI, 0.58-0.77), TAC1 (AUC 0.78; 95% CI, 0.70-0.86) and SOX17 (AUC 0.78; 95% CI, 0.70-0.86) (Figure 2B), with corresponding sensitivities and specificities of 65% and 74% for CDO1, 76% and 78% for TAC1, and 71% and 86% for SOX17, respectively. The positive and negative predictive values for these genes were 86% and 46% for CDO1, 90% and 57% for TAC1, and 93% and 54% for SOX17, respectively.

The sensitivity and specificity derived from the optimum cutoff point of the ROC curve for the combination of the three best performing markers in sputum (TAC1, SOX17 and HOXA7) were 93% and 79%, respectively, with a corresponding AUC of 0.89 (95% CI, 0.80-0.98) (Figure 2C). In plasma, the combination of CDO1, TAC1 and SOX17 showed a sensitivity, specificity and AUC of 91%, 64% and 0.77 (95% CI, 0.68-0.86), respectively (Figure 2D).

Smokers subset analysis

Since CT screening for lung cancer is currently recommended for current and former smokers, we explored the diagnostic accuracy when only smokers were considered (n=155; 114 with cancer and 41 without cancer) (Supplemental Table S2). The results in smokers only were similar to those in the entire study population for the prevalence of patients with detectable methylation, sensitivity, specificity and AUC (Supplemental Table S3). The AUC in smokers only was 0.89 (95% CI, 0.79-0.99) for the combination of the methylation status of the best three genes from sputum and 0.85 (95% CI, 0.76-0.94) for the best three genes from plasma (Supplemental Tables S4 and S5).

Independent Prediction Accuracy Performance

While the above analysis looked at individual gene methylation in cases and controls to detect cancer, independent blinded random forest prediction models analyzed all these biomarkers in combination with clinical risk factors. Risk factors included in the first two random forest prediction models were methylation Ct values from all six genes, age, pack-years, COPD status and FVC values. The methylation Ct values were not included in the last prediction model. The randomly selected training dataset had 140 subjects, with 99 (70.7%) cancers and 41 (29.3%) controls. The independent test set had 70 subjects, with 51 (72.9%) cancers and 19 (27.1%) controls. In the variable importance output of the first two random forest prediction models, methylation Ct values were ranked as more important than demographic and clinical variables (Figures 3 and 4). Table 3 summarizes the prediction accuracies of these three models when they were applied to the independent test set patients. With sputum samples, the random forest model correctly predicted lung cancer in 91% of subjects in the test subset. The corresponding AUC was 0.85 (95% CI, 0.59-1.0) (Figure 3). The sensitivity and specificity of the prediction in the test subset from the ROC curve were 0.93 and 0.86, respectively. Using plasma samples, the random forest model correctly predicted lung cancer in 85% of subjects in the test subset. The corresponding AUC was 0.89 (95% CI, 0.79-0.99) (Figure 3). The sensitivity and specificity of the prediction in the test subset from the ROC curve were 0.93 and 0.67, respectively. Using clinical and demographic risk factors alone, the accuracies were lower than for the first two models, with a diagnostic accuracy of 68%, an AUC of 0.64, a PPV of 75% and an NPV of 38% (Figure 3 and Table 3).

Discussion 

High diagnostic accuracy for early stage lung cancer can be obtained using a panel of methylated promoter genes in sputum or plasma and an ultrasensitive detection strategy based on MOB. This assay has several characteristics which make it clinically useful: (i) it has a sensitivity and specificity in sputum and plasma which exceed the diagnostic accuracy required by most clinical standards;(12, 38) (ii) it can be performed with minute quantities of DNA from sputum or plasma; and (iii) it can be used to distinguish malignant from benign CT-detected nodules, addressing the current problem of high false positive CT findings in lung cancer screening. This discrimination is associated with risk of lung cancer independent of age, pack-years and nodule size, and it can detect early stage lung cancer in smokers. Finally, as a PCR-based assay, it is simple and relatively inexpensive.

Previous studies have sought to improve lung cancer risk assessment by the use of molecular biomarkers obtained from blood and sputum.(17, 19, 20, 32, 33, 39, 40) However, none of these tests has been used clinically because their sensitivities and specificities were usually not high enough for clinical decision-making.(17, 19, 20, 32, 33, 39-41) With improvements in DNA extraction methods and processing for methylation detection, along with the use of highly prevalent cancer-specific methylation targets, we have overcome these obstacles.

In this study, detection of methylation in sputum samples was slightly better than the detection of these same genes in plasma. The access of early cancers to the airways may be one explanation for this difference. Indeed, changes in the airways form the basis for the AEGIS Study, which reported an improved diagnostic yield of bronchoscopy using gene-expression classifiers from epithelial cells collected during bronchoscopy.(41) The AUC, sensitivities and specificities reported in the AEGIS Study were lower than the ones in the present study.

In our model where methylation markers from blood were considered simultaneously with age and number of pack-years, we observed a predictive accuracy close to that of sputum. This suggests that blood could substitute for sputum in lung cancer detection in those cases where sputum cannot be obtained.

According to the NLST, the chances of having lung cancer with a positive CT screening are less than 5%.(6, 7) This is because CT screening for lung cancer in the NLST yielded a 71% sensitivity but a 63% specificity, with a 96.4% false positive rate.(6, 7) Our current findings indicate that methylation detection using a few genes from blood and sputum could potentially reduce false positive screening. Although our study included patients who would not meet current lung cancer screening guidelines, we observed similar detection rates when only smokers were analyzed. Replication and external validation of our findings in a large, prospective, multicenter case-control trial are essential before this approach can be adopted.

Conclusion

This study shows that it is possible to obtain high sensitivity and specificity for the detection of early stage NSCLC using a panel of methylated promoter genes in plasma and sputum, and that the methylation level of these genes is associated with a high lung cancer risk independent of age, pack-years and nodule size. These epigenetic biomarkers could be used as an adjunct to CT screening to identify patients at high risk for lung cancer, reducing false positive results and unnecessary tests, as well as improving the diagnosis of lung cancer at an earlier stage.
Figures

Figure 1. Methylation detection values of the studied genes.

This scatter plot shows the converted 2^-ΔCT methylation values on a logarithmic scale. These values show a bimodal distribution, with the lower group corresponding to samples with no detectable amplification (ND). The majority of lung tumor samples have high levels of methylation, as expected from the previous study. Blood and sputum samples from cancer patients have detectable methylation which varies from levels nearing that of tumor samples to those at the limits of detection (10^-5 to 10^-6), while methylation is undetectable in some patients. The majority of controls have undetectable methylation at these loci, although some control patients do have detectable methylation that is quantitatively similar to that of cancer patients. HOXA9 methylation is detectable in most control patients, especially in sputum, suggesting that this change is present in the lung epithelium and is not as specific for the detection of cancer.
Figure 2. Receiver operating characteristic curves for lung cancer detection.

A. ROC curves comparing the 3 genes with the largest areas under the curve for sputum. B. ROC curves comparing the 3 genes with the largest areas under the curve for blood. C. ROC of the combined methylation status of the genes from sputum with the largest area under the curve. D. ROC of the combined methylation status of the genes from blood with the largest area under the curve.

[Figure panels not reproduced. Recoverable panel title: "TAC1, HOXA7, & SOX17 Sputum"; x-axis: false positive rate.]

Abbreviations: area under the curve: AUC, 95% confidence interval: 95% CI.
Figure 3. Receiver operating characteristic curves for cancer predictions.

ROC curves assessing the accuracy of the predictions for lung cancer performed on the testing subset, using as predictors the ΔCt values for all six genes, age, pack-years, COPD status and FVC values. The left plot was obtained using sputum samples, the middle one using blood samples, and the right one shows the ROC curve for the clinical predictors alone.

[Figure panels not reproduced. Panel titles: "Sputum prediction", "Blood prediction", "Clinical variables prediction"; x-axis: false positive rate.]
Figure 4. Variable importance plot for random forest prediction.

The plot details the relative importance of each of the variables to the model's accuracy (including methylation 2^-ΔCT values, nodule size, age, pack-years, COPD status and FVC values). The x-axis is the mean decrease in the Gini coefficient that results when that variable is included in the model. The Gini coefficient is a measure of inequality among the trees in the random forest, and in this case represents the performance of the random forest model with and without a variable included. Those variables that have the highest decrease in the Gini coefficient were most likely to create consensus among the individual decision trees used in the model (or reduce inequality) when included in the model. These variables are therefore most predictive of the outcome of the model overall. Those variables with a small mean decrease in the Gini coefficient are relatively less important to the prediction made by the random forest model.
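For readers unfamiliar with the measure, the sketch below shows how a decrease in Gini is usually computed for a single tree split in random forest implementations: as the drop in node impurity when the subjects at a node are divided by a variable. A variable's importance is then accumulated over all splits that use it and averaged over the trees. This is a generic illustration with hypothetical function names and toy labels, not the code used to produce the figure.

```python
def gini_impurity(labels):
    """Gini impurity of a node with binary labels (1 = cancer, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)   # equals 1 - p^2 - (1 - p)^2


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Toy node with three cancers and three controls.
node = [1, 1, 1, 0, 0, 0]
perfect_split = gini_decrease(node, [1, 1, 1], [0, 0, 0])
uninformative_split = gini_decrease(node, [1, 0], [1, 1, 0, 0])
```

In the toy example a split that separates cases from controls perfectly removes all of the node's impurity (a decrease of 0.5), whereas a split that leaves both children evenly mixed yields no decrease.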
Tables

Table 1. Baseline Characteristics of the 210 Subjects.

Patient Characteristics          Cancer (N=150)       Control (N=60)       p Value
Age at surgery (years) (IQR)     68 (62-75)           63 (55-73)           0.007
Gender                                                                     0.094
  Male (%)                       63 (42%)             33 (55%)
  Female (%)                     87 (58%)             27 (45%)
Race                                                                       0.087
  White (%)                      120 (80%)            51 (85%)
  Black (%)                      19 (13%)             3 (5%)
  Other (%)                      11 (7%)              6 (10%)
Stage                                                                      NA
  IA-IB (%)                      136 (91%)            NA
  IIA (%)                        14 (9%)              NA
Histology                                                                  NA
  Adenocarcinoma (%)             121 (81%)            NA
  Squamous-cell (%)              26 (17%)             NA
  Adenosquamous (%)              3 (2%)               NA
Smoking status                                                             0.176
  Current (%)                    27 (18%)             7 (12%)
  Former (%)                     87 (58%)             34 (57%)
  Never (%)                      31 (21%)             19 (32%)
Pack-year (IQR)                  30 (10-50)           20 (0-35)            0.010
COPD (%)                         41 (27%)             12 (20%)             0.370
FEV1 % Predicted (IQR)           84 (70-99)           85 (70-100)          0.861
FVC % Predicted (IQR)            92 (80-103)          87 (80-110)          0.682
FEV1/FVC % Ratio (IQR)           73 (68-78)           77 (70-79)           0.080
Nodule size (cm)                 2 (1.5-3)            1.5 (1.1-3)          0.01
Nodule size category                                                       0.001
  < 1 cm                         6 (4%)               13 (22%)
  1-2 cm                         52 (35%)             19 (32%)
  > 2 cm                         92 (61%)             28 (47%)
Nodule volume (cm3)              4.19 (1.77-14.14)    1.6 (0.52-18.12)     0.001

Abbreviations: Chronic obstructive pulmonary disease: COPD, Forced expiratory volume in one second: FEV1, Forced vital capacity: FVC, Interquartile range: IQR.

Table 2A. Gene Methylation Detection, Sensitivity, Specificity Using Detectable vs. Non-detectable cutoff from Figure 1.

Sputum                 Cancer (n=90)           Control (n=24)
Gene                   n      Sensitivity      n      Specificity      PPV      NPV
CDO1                   70     78%              8      67%              90%      44%
TAC1                   77     86%              6      75%              93%      58%
HOXA7                  57     63%              2      92%              97%      40%
HOXA9                  84     93%              22     8%               79%      25%
SOX17                  76     84%              3      88%              96%      60%
ZFP42                  78     87%              9      63%              90%      56%
TAC1, HOXA7, SOX17     88     98%              7      71%              93%      89%

Plasma                 Cancer (n=125)          Control (n=50)
Gene                   n      Sensitivity      n      Specificity      PPV      NPV
CDO1                   81     65%              13     74%              86%      46%
TAC1                   95     76%              11     78%              90%      57%
HOXA7                  42     34%              4      92%              91%      36%
HOXA9                  108    86%              27     46%              80%      58%
SOX17                  91     73%              8      84%              92%      55%
ZFP42                  105    84%              23     54%              82%      57%
CDO1, TAC1, SOX17      116    93%              19     62%              86%      78%


Table 2B. Gene Methylation Sensitivity, Specificity, at optimal cutoffs with AUC and Association with Cancer Diagnosis using Sputum and Plasma.

Sputum                 Sensitivity   Specificity   PPV     NPV     AUC     95% CI
CDO1                   78%           67%           90%     45%     0.70    (0.57-0.84)
TAC1                   84%           79%           94%     57%     0.84    (0.74-0.94)
HOXA7                  63%           92%           97%     40%     0.77    (0.67-0.86)
HOXA9                  77%           42%           83%     32%     0.56    (0.41-0.69)
SOX17                  84%           88%           96%     59%     0.84    (0.75-0.94)
ZFP42                  88%           62%           90%     58%     0.73    (0.60-0.87)
TAC1, HOXA7, SOX17     93%           79%           94%     75%     0.89    (0.80-0.98)

Plasma                 Sensitivity   Specificity   PPV     NPV     AUC     95% CI
CDO1                   65%           74%           86%     46%     0.68    (0.58-0.77)
TAC1                   76%           78%           90%     57%     0.78    (0.70-0.86)
HOXA7                  33%           94%           93%     36%     0.60    (0.51-0.69)
HOXA9                  81%           52%           81%     52%     0.62    (0.52-0.73)
SOX17                  71%           86%           93%     54%     0.78    (0.70-0.86)
ZFP42                  81%           58%           83%     55%     0.66    (0.56-0.75)
CDO1, TAC1, SOX17      91%           64%           86%     74%     0.77    (0.68-0.86)

Abbreviations: area under the curve (in the ROC curves): AUC, 95% confidence interval: 95% CI.

Table 3. Performance for lung cancer diagnosis of the independent blinded random forest prediction models on the testing subset

Model                        Sensitivity   Specificity   PPV     NPV     AUC     95% CI
Prediction from Sputum       93%           86%           96%     75%     0.85    0.59-1
Prediction from Blood        93%           67%           87%     80%     0.89    0.79-0.99
Clinical Predictors alone    84%           26%           75%     38%     0.64    0.50-0.78